53 research outputs found

Refactoring intermediately executed code to reduce cache capacity misses

Author: Beyls Kristof
D'Hollander Erik
Publication date: 01/01/2008

The growing memory wall requires that more attention be given to the data cache behavior of programs. This paper focuses on capacity misses, i.e., the misses that occur because the cache size is smaller than the data footprint between the use and the reuse of the same data. The data footprint is measured with the reuse distance metric, by counting the distinct memory locations accessed between use and reuse. For reuse distances larger than the cache size, the associated code needs to be refactored in a way that reduces the reuse distance to below the cache size, so that the capacity misses are eliminated. In a number of simple loops, the reuse distance can be calculated analytically. However, in most cases profiling is needed to pinpoint the areas where the program needs to be transformed for better data locality. This is achieved by the reuse distance visualizer, RDVIS, which shows the intermediately executed code for critical data reuses. In addition, another tool, SLO, annotates the source program with suggestions for locality optimization. Both tools have been used to analyze and to refactor a number of SPEC2000 benchmark programs with very positive results.
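
The reuse distance metric described in this abstract can be illustrated with a small sketch. It is not taken from the paper or from RDVIS/SLO. The function names are hypothetical, and the capacity-miss count assumes a fully associative LRU cache in which each memory location occupies one cache line.

```python
from typing import Hashable, List, Optional, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """Return, per access, the number of distinct other locations touched
    since the previous access to the same location (None on first use)."""
    last_seen = {}  # location -> index of its most recent access
    distances: List[Optional[int]] = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct locations accessed strictly between use and reuse.
            between = set(trace[last_seen[addr] + 1:i])
            distances.append(len(between))
        else:
            distances.append(None)  # cold access, no earlier use
        last_seen[addr] = i
    return distances


def capacity_misses(trace: Sequence[Hashable], cache_lines: int) -> int:
    """Count reuses whose distance does not fit in the cache.

    Assumption: fully associative LRU cache, one location per line,
    so a reuse hits only if its distance is below cache_lines.
    """
    return sum(
        1 for d in reuse_distances(trace)
        if d is not None and d >= cache_lines
    )


# Hypothetical traces: two separate passes over an array versus a
# refactored version that reuses each element immediately.
N = 8
two_passes = [("a", i) for i in range(N)] + [("a", i) for i in range(N)]
fused = [("a", i) for i in range(N) for _ in range(2)]
```

With a four-line cache, every reuse in `two_passes` has distance `N - 1 = 7` and misses, while every reuse in `fused` has distance 0 and hits. This quadratic-time version is only meant to make the definition concrete; profilers use more efficient data structures.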

Ghent University Academic Bibliography

High performance computing with FPGAs

Author: Beyls Kristof
D'Hollander Erik
Publication venue: 'IOS Press'
Publication date: 01/01/2009

Field-programmable gate arrays represent an army of logical units which can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges in finding the right processing paradigm, one which takes into account the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this paper, the use of FPGAs as a multiprocessor on a chip and their use as a highly functional coprocessor are first compared, and the programming tools for hardware/software codesign are discussed. Next, a number of techniques are presented to maximize the parallelism and optimize the data locality in nested loops. These include unimodular transformations, data locality improving loop transformations and the use of smart buffers. Finally, the use of these techniques on a number of examples is demonstrated. The results in the paper and in the literature show that, with the proper programming tool set, FPGAs can speed up computation kernels significantly with respect to traditional processors.

Ghent University Academic Bibliography

Performance visualizations using XML representations

Author: Beyls Kristof
D'Hollander Erik
Yu YJ
Publication date: 01/01/2004

The intermediate representation (IR) forms the information exchanged among different passes of program compilation. The intermediate format proposed for extensibility and persistence is written in XML. In this way, the program transformations that were internal to the compiler become visible. The hierarchical structure of XML makes a natural representation for the abstract syntax tree (AST). A compiler can parse the program source into an IR, then output it as an XML document. Separated by orthogonal namespaces, other IRs are also presented in the same XML document, gathering program information such as dependence vectors, transforming matrices, iteration spaces, dependence graphs and cache reuse distances. This XML document can be exchanged between the compiler and program visualizers for parallelism and locality.
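
As a rough illustration of the idea in this abstract, the sketch below serializes a tiny AST fragment as XML and attaches a reuse distance annotation in a separate namespace. The namespace URIs, element names and attribute names are all hypothetical. The abstract does not give the actual schema, so this shows only the general pattern, using Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: one for the AST, one for locality annotations.
AST_NS = "urn:example:ast"
RD_NS = "urn:example:reuse"


def build_ir() -> ET.Element:
    """Build an AST fragment for `for i in 0..N: a[i]` with an annotation."""
    ET.register_namespace("ast", AST_NS)
    ET.register_namespace("rd", RD_NS)
    loop = ET.Element(
        f"{{{AST_NS}}}for", {"var": "i", "lower": "0", "upper": "N"}
    )
    ref = ET.SubElement(
        loop, f"{{{AST_NS}}}arrayref", {"name": "a", "index": "i"}
    )
    # Analysis result kept apart from the AST by its own namespace.
    ET.SubElement(ref, f"{{{RD_NS}}}distance", {"value": "1024"})
    return loop


def roundtrip(elem: ET.Element) -> ET.Element:
    """Serialize to text and parse back, as a compiler and a visualizer
    exchanging the document would."""
    text = ET.tostring(elem, encoding="unicode")
    return ET.fromstring(text)


parsed = roundtrip(build_ir())
annotation = parsed.find(f"{{{AST_NS}}}arrayref/{{{RD_NS}}}distance")
```

A consumer that only understands the AST namespace can ignore the `rd` elements, which is the kind of separation the phrase "orthogonal namespaces" suggests.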

Ghent University Academic Bibliography

Experiences with enumeration of integer projections of parametric polytopes

Author: Beyls Kristof
Bruynooghe Maurice
Catthoor Francky
Verdoolaege Sven
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Many compiler optimization techniques depend on the ability to calculate the number of integer values that satisfy a given set of linear constraints. This count (the enumerator of a parametric polytope) is a function of the symbolic parameters that may appear in the constraints. In an extended problem (the "integer projection" of a parametric polytope), some of the variables that appear in the constraints may be existentially quantified, and the enumerated set then corresponds to the projection of the integer points in a parametric polytope. This paper shows how to reduce the enumeration of the integer projection of parametric polytopes to the enumeration of parametric polytopes. Two approaches are described and experimentally compared. Both can solve problems that were considered very difficult to solve analytically.
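
To make the two counting problems in this abstract concrete, here is a brute-force sketch on two hand-picked example sets. It is not the paper's method, which derives the counts symbolically; the function names and the example constraints are assumptions chosen for illustration.

```python
def count_polytope(n: int) -> int:
    """Count integer points (i, j) with 0 <= i <= j <= n.

    The count is a function of the parameter n; for this set the
    closed form is (n + 1)(n + 2) / 2.
    """
    return sum(1 for i in range(n + 1) for j in range(i, n + 1))


def count_projection(n: int) -> int:
    """Count integers i with 0 <= i <= n such that some integer j
    satisfies i = 3 * j (j is existentially quantified).

    Only the distinct values of i are counted, i.e. the projection
    of the integer points (i, j) onto i; here that is floor(n / 3) + 1.
    """
    points = {i for i in range(n + 1) for j in range(n + 1) if i == 3 * j}
    return len(points)
```

The second example shows why the projection case is harder: the answer involves a floor, so it is no longer a single polynomial in `n`.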

Ghent University Academic Bibliography